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ABSTRACT 



Solar twins have been a focus of attention for more than a decade, because their structure is extremely close to that of the Sun. Today, 
thanks to high-precision spectrometers, it is possible to use asteroseismology to probe their interiors. Our goal is to use time series 
obtained from the HARPS spectrometer to extract the oscillation frequencies of 18 Sco, the brightest solar twin. We used the tools 
of spectral analysis to estimate these quantities. We estimate 52 frequencies using an MCMC algorithm. After examination of their 
probability densities and comparison with results from direct MAP optimization, we obtain a minimal set of 21 reliable modes. The 
identification of each pulsation mode is straightforwardly accomplished by comparing to the well-established solar pulsation modes. 
We also derived some basic seismic indicators using these values. These results offer a good basis to start a detailed seismic analysis 
of 18 Sco using stellar models. 

Key words. Stars: individual: 18 Sco - Stars: oscillations - Techniques: radial velocities - Methods: data analysis 
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1. Introduction 

In the field of stellar physics, the study of solar twins has recently 
received growing attention. Since the term was first coined by 
Cayrel de Strobel et al. (1981) to designate stars spectroscopi- 
cally identical to the Sun, they have been the focus of photomet- 
ric and spectroscopic studies aiming at measuring their funda- 
mental atmospheric parameters (Gustafsson 1998; Melendez & 
Ramirez 2007; Gustafsson 2008; Melendez et al. 2010). On the 
one hand there has been an on-going race to find the "best" solar 
twin. On the other, samples of such stars have been used to try 
to answer statistically the question: is the Sun a peculiar star? 

Nowadays, thanks to various technical breakthroughs in the 
field of observational astrophysics, these founding studies can be 
supplemented by additional measurements. Spectropolarimetric 
surveys of the solar twins using NARVAL have allowed ob- 
servers to detect magnetic fields in several objects and to re- 
construct their magnetic topology (Petit et al. 2008). More re- 
cently, Bazot et al. (2011, hereafter Paper I) combined inter- 
ferometric and asteroseismic measurements, from respectively 
the PAVO beam combiner at the CHARA array and the high- 
precision spectrograph HARPS at La Silla Observatory, to esti- 
mate the linear and acoustic radii of 18 Sco, the brightest solar 



twin, and to derive its mass. The method used to determine the 
acoustic radius relied on the use of the autocorrelation of the 
radial-velocity time series. This paper is the continuation of this 
study, which now aims at a detailed analysis of the seismic data. 

Asteroseismology measures quantities directly sensitive to 
the stellar interiors. This is an advantage compared to the classi- 
cal observable quantities, usually obtained from spectroscopy or 
photometry, which are sensitive to the poorly-modelled external 
layers of the stars. In contrast, it is possible to extract informa- 
tion from the seismic signal roughly independent of these layers 
and more robustly described by the existing stellar codes. 

Following the development of high-precision spectrographs 
(mostly for the search of extrasolar planets) and the space mis- 
sions CoRoT and Kepler, seismic data have been more and more 
frequently used to model stars in general, and main sequence 
sun-like stars in particular (e.g., Miglio & Montalban 2005; 
Bazot et al. 2005, 2008; Dogan et al. 2010; Metcalfe et al. 2010; 
Brandao et al. 2011). The best-case scenario from the perspec- 
tive of detailed modelling is to have access to the eigenfrequen- 
cies of the stellar pulsation modes and to their characteristic 
numbers n,l,m . They are then combined to effectively obtain 



'Based on observations collected at the European Organisation for 
Astronomical Research in the Southern Hemisphere, Chile (run ID: 
183.D-0729(A)) 



'With n the number of nodes of the eigenfunction to the stellar pul- 
sation equations and /, m corresponding to the angular degree and the 
azimuthal order of the eigenfunction, related to the orthogonal set of 
spherical harmonics Yj", which are solutions to Laplace's equation. 



1 



M. Bazot et al.: Estimating the p-mode frequencies of the solar twin 18 Sco 




0.05 



4 6 

Time (days) 

Fig. 1. Time series of radial velocities (upper panel) and their 
uncertainties (lower panel) from HARPS observations of 18 Sco. 
A few points deviating strongly from the bulk of the time series 
lie outside the plotted range. 



surface-independent information. In the following, we present 
the strategy used to determine these frequencies. 

In Section 2 we discuss the data and return to some of the 
sampling issues evoked in Paper I. Before proceeding to the fre- 
quency analysis we give an overview of the characteristics of 
the noise affecting the seismic signal in Section 3. We recall 
the characteristics of the parametric model used to describe the 
power spectrum in Section 4. We define an inverse problem for 
the estimation of the parameters and cast it into a Bayesian for- 
mulation. We then solve it numerically using a Markov Chain 
Monte Carlo (MCMC) algorithm. Our strategy is to first, test 
the methodology using simulated time-series and then to apply 
it to the real data. In Section 5, we discuss several aspects of our 
results. From a methodological standpoint, we try to assess the 
robustness of our MCMC strategy by comparing it to another es- 
timation procedure. We also discuss the choice of our parametric 
model and the priors we used in our Bayesian formulation. From 
a physical point of view, we measure the impact of our new esti- 
mates on the acoustic radius and the stellar mass. 



2. Data 

The time series was described in Paper I and is shown in Fig. 1 in 
its unfiltered version. We collected 2833 radial velocity measure- 
ments over 12 nights from 10 to 21 May 2009. In this section, we 
would like to draw the attention on some of the consequences of 
the window function w(t) = £ 5(t - f„), with t„ the median time 
of the n-th exposure. 

The observed signal can be written y(t) = y(t)w(t), with y(f) 
the continuous time series. When the sampling is uniform, the 
definition of the Nyquist frequency is vn = l/(2Af), with At 
the sampling time. In the case of unevenly sampled time series, 
there is, strictly speaking, no Nyquist frequency. However, this 
does not mean that there is no spectral folding and equivalent 
frequencies have been suggested (see e.g. Bretthorst 2000). 

The power spectrum of 18 Sco is shown in Fig. 2. The dif- 
ferences with the figure of Paper I come from the filtering of 
the time series 2 . The inset shows the spectral window, that is the 
squared modulus of the Fourier transform of w{t). Its maximum 
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Fig. 2. Power spectrum of the radial velocity of 18 Sco esti- 
mated using a Lomb-Scargle weighted periodogram. The ver- 
tical dashed line marks the equivalent Nyquist frequency calcu- 
lated using the median exposure time. The inset shows the power 
spectrum of the window function w{t), normalized to its maxi- 
mum. 

is at (around which also stand the daily aliases, not visible here, 
see Paper I), and two prominent peaks at +7.4 mHz. They are 
the aliases caused by the sampling. This means that any signal 
present at frequencies around 3.7 mHz is likely to be folded. It 
should, however, be emphasized that this is not a clear-cut limit, 
nor an exact estimate; therefore, it is not possible to assess firmly 
whether some modes at higher frequencies perturb the spectrum 
below 3.7 Hz or, for that matter, if we are missing some genuine 
modes above this limit. 

It turns out that the median observing time, med(Af), offers a 
reasonably good estimator of the upper limit for folding. This is 
expected because the sampling is close to uniform. From night 
to night, the mean values of the observing time lie in the range 
132 - 172 s and their standard deviations within 2 - 232 s (the 
upper value resulting mostly from one cloudy night, otherwise, 
the standard deviations lie in the range 2 - 60 s, the remaining 
discrepancy being explained by longer exposures during some 
of the nights). Furthermore, the median is almost unaffected by 
the daily gaps. Defining our "equivalent Nyquist frequency" as 

(i) 



2med(Af) ' 



we obtain an indicative value of 3.7 mHz, in good agreement 
with the general shape of the spectrum. 

Such a low value of v£ is also a problem when estimating 
the various noise contributions in the power spectrum. In partic- 
ular, it makes it difficult to address the question of the photon 
noise level, which is the main source of noise at high frequen- 
cies, i.e. few mHz above the p-mode envelope (see Fig. 3). A 
common, straightforward approach consists of estimating it by 
simply calculating an average of the amplitude spectrum well 
above the p-mode region; this cannot be done here because of 
the spectral folding. 



3. Noise 

Before considering the extraction of p-mode oscillation frequen- 
cies, we discuss briefly the impact of the different noise sources 



2 And from a small numerical bug in the weighting scheme of the 
LS spectrum in Paper I. The lower amplitude observed in Fig 2 should 



be considered as the reference. This does not affect the results from 
Paper I. 
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Table 1. Parameters for the best fit to the smoothed spectrum 
with uncertainties. SG and G stand respectively for the super- 
granulation and granulation components. 
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Fig. 3. Power spectrum of the radial velocity of 18 Sco repre- 
sented in logarithmic units. The contributions to the noise from 
granulation (G), supergranulation (SG) and from a third Harvey- 
like component are represented as dashed lines. The photon 
noise is a dot-dashed line. The p mode component of the spec- 
trum appears as a dotted line. The full red line shows their sum. 
The black dots are the points of a binned power spectrum that 
has been effectively used to perform the fit. 



on the relevant signal. It is customary to consider three main phe- 
nomena contributing to the overall noise: activity, granulation 
(at different scales) and instrumental noise (which we assumed 
to be close to the photon noise, however see Dumusque et al. 
2011). Characteristic times for activity typically scale as the ro- 
tational period. In the case of 18 Sco, a 22.7-d rotation period 
has been measured from spectropolarimetry (Petit et al. 2008). 
A 12-d time series is certainly not enough to capture this kind of 
signal. We therefore ignore this contribution. 

Harvey (1985) suggested modelling the contributions to 
the power spectrum coming from granulation as Lorentzian, 
4ct 2 t/[1 +(2ttvt) 2 ], with t the characteristic timescale of the pro- 
cess and cr the velocity rms. Such functions are representative of 
non-oscillatory velocity fields whose autocorrelations decay ex- 
ponentially with time. However, it is difficult to detect this kind 
of signal precisely enough to assess accurately which of these 
phenomena contribute effectively to the low-frequency signal. 
In fact, the exact dependency of these noise components on fre- 
quency is unclear and still subject to discussion (e.g. Guenther 
et al. 2008). Note that it is also possible to use simpler scal- 
ing laws of the form v~ 2 to model them. In all cases, following 
Harvey (1985), we should warn that these models are crude. 

In seismic studies, it is customary to use parametrized 
"Harvey functions", Jf?(v) = a/[l + (Ijivrf], in order to im- 
prove the fit to these low-frequency components. After several 
trials we found that we needed to use three such functions. The 
photon noise is assumed to be white. The contribution to the 
power spectrum of the p modes is modelled as a Lorentzian enve- 
lope (Dumusque et al. 201 1). Figure 3 shows the power spectrum 
corresponding to our time series in logarithmic units. It has been 
computed using a weighted Lomb-Scargle periodogram (Lomb 
1976; Scargle 1982; Zechmeister & Kiirster 2009). It was evalu- 
ated at frequencies separated by l/T, with T the total observing 
time. To allow comparison with previous studies, it has been nor- 
malized according to Kjeldsen et al. (2005). The best fit to the 
spectrum of this composite model is also shown. For stability, 
the fit to the background was performed on a heavily smoothed 
version of the spectrum in order to retain only the slowly-varying 





T(S) 


a (m 2 s 2 /yuHz) 




1 (SG) 
2 

3(G) 


6642 ± 438 
1051 ± 34 
389 ±5 


(7.2 ± 1.1) x 1(T 2 
(6.8 ± 0.5) x 10~ 2 
(5.7 ± 0.1) x 10- 4 


5.1 ±0.3 
16.9 ± 1.3 
2.47 ± 0.05 


Power (m 2 s 2 /yuHz) 


Photon noise 




(6.46 ± 0.16) x 10" 5 





features. These values, because of the low "Nyquist" frequency, 
should be considered with care. In particular, it is much easier to 
perform this fit when the photon-noise constant component can 
be fixed separately using the signal-free high-frequency regions 
of the power spectrum 3 . 

The results of our fit are given in Table 1, which displays 
the parameters of the Harvey functions and the average photon 
noise. In order to estimate uncertainties on the fitted parameters, 
we assumed that the binned points in the smoothed power spec- 
trum obey Gaussian statistics, whose variances were evaluated 
using the rms in each bin. We then used a simple Monte Carlo 
experiment with 10000 simulated spectra to obtain the uncer- 
tainties displayed in Table 1. 

The Harvey functions are often interpreted as representing 
different scales of surface convective motions. In the case of 
granulation, the time scale agrees well with those found in the 
solar case by Harvey (1985) and Lefebvre et al. (2008). It is pos- 
sible that the the larger time-scale phenomena correspond to su- 
pergranulation, which has been well-studied in the solar case. 
We however note that the value returned for t is lower by one 
or two orders of magnitude than results found in the by Harvey 
(1985); Title et al. (1989); De Rosa et al. (2000) and Shine et al. 
(2000). It is not possible with such short time series to deter- 
mine whether it is an intrinsic difference between 18 Sco and the 
Sun or if this is a methodology-induced effect. The intermediate 
scale is sometimes interpreted as mesogranulation (Dumusque 
et al. 201 1), but its very existence is subject to debate in the so- 
lar case (see Nordlund et al. 2009, and references therein). This 
debate being outside the scope of this paper, we limit ourselves 
to mention this intermediate scale without attributing it to sur- 
face convection or to an instrumental artifact in the data. 



4. Modelling of the signal 

4. 1 . The spectrum model 

In sun-like stars, p modes are excited by turbulent convective 
motions. Meanwhile, each mode is damped (see e.g Houdek 
et al. 1999). It is assumed that the number of excitation events 
per damping time is large. The central-limit theorem then en- 
sures that the amplitude distributions of the modes converge 
towards normal ones (Foglizzo 1998), which translates into a 
power spectrum that is exponentially distributed. We make the 
additional assumption that the frequency bins in the Fourier 
space are independent. In this case, the density probability den- 
sity of the power spectrum at frequency v, is given by 



3 In fact, the only region of the power spectrum that seems relatively 
free from stellar granulation noise is the narrow range 1800-2200 /iHz, 
where the average noise is 2.6 cm s~ 2 (against 2.4 cm s~ 2 for the fitted 
photon noise). 
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f(p(vd) = 



exp 



p(vd 



(2) 



Ideally, if one obtains many independent measurements of 
the stellar power spectrum, it will tend, at each v,, towards the ex- 
pectation value &(vi). The form of this function must be speci- 
fied from our knowledge of the physical processes governing the 
mode excitation and the various sources of noise. The expected 
value of the power spectrum corresponding to one oscillation 
mode k, &k(.Vi), was derived theoretically by Anderson et al. 
(1990). Considering the equation for a simple damped oscillator 
with random forcing, they showed that ^(v,) has the form of 
a Lorentzian centred at the eigenfrequency of the mode. If 
we make the additional assumption that the modes are uncorre- 
cted, which is supported by the observations in the solar case 
(Foglizzo et al. 1998), the expectation value for the multi-modal 
power spectrum can be written in the form 



^(y,) 



^"(Vf) + , 



VtH k _ 

2 

i,k 



\W(y)\ 2 



(3) 



Here, the Vk represent the visibilities of the modes, Hf, their 
heights, and = 2(v, - v^/T^, with 1^ the linewidth of the 
Lorentzian. It is related to the mode lifetime through the relation 
r^. = 1 /7tt>, with Tj the lifetime of mode k. These parameters are 
real-valued, ^"(v,) is the power density for all the modes that 
are not taken into account in the sum (mostly I = 4 and 5 modes, 
but also unidentified modes with / < 3; Fletcher et al. 2009) but 
still contribute to the overall power, and SS(yi) the noise con- 
tribution, using a v~ 2 scaling law to describe the low-frequency 
component. Note that the convolution by |W(v)| 2 (W(v) being 
the Fourier transform of the window function) in Eq. (3) is only 
justified because we are considering the expectation value of the 
power of a stochastic function (Deeming 1975), hence assuming 
our time series are purely stochastic functions. A more rigorous 
approach can be found in Stahn & Gizon (2008) but was not 
implemented here. 

The model is characterized by a parameter vector 8 e 
{{6k}i<k<K,Oag,6,0s>}, which we must to estimate. Here B k e 
{vk,^k,Hk) and has to be estimated for K modes 4 (the Vk can 
be computed straightforwardly provided some assumptions are 
made on the mode degree). The vectors Bgg and Bg» regroup the 
parameters describing respectively 3§ (see Sect. 3) and SP' . The 
inverse problem is now set, as we need to provide estimates of 
8 being given y = (y\, . . . ,yti). This estimation problem, when 
applied to helio- or asteroseismic data is sometimes referred to 
as "peak bagging". 

A classical approach, first used in helioseismology and then 
adopted in asteroseismology, involves estimating the parame- 
ters of @*{y) by minimizing the corresponding negative log- 
likelihood function (Anderson et al. 1990) 



-ln(L(0)) = - J] ln/(p(v,;0)) = £ 



p(Vi) 



+ In &>(vi) 



, (4) 



where we used the notation /(/?(v,;0)) = f(p(vj)) to indicate the 
dependency on the model parameters 5 . 



4 This number has to be provided by some ansatz. 

5 Using the (frequentist) Maximum Likelihood (ML) method, one 
defines a Maximum Likelihood Estimator (MLE) as the value of that 
minimizes —L(ff) 



MLE(0) = argmin(- ln(L(0))). 



4.2. Bayesian formulation of the problem 

Studying L(B) can be an challenging task. For likelihoods that 
are highly non-linear in the parameters, the chance is high that 
they possess multiple maxima. This makes the identification of 
high-likelihood regions equivocal and hence complicates the pa- 
rameter estimation. Furthermore, if the dimension of the space 
of parameters is large, serious algorithmic problems might oc- 
cur. In particular, it becomes difficult to even locate the maxima 
of L{6) because one cannot sample efficiently the space of pa- 
rameters. 

A possible approach to this problem consists of using the 
Bayes' formula 

n{8\y) oc n(8)L(B), (5) 

where n(6\y) is the Posterior Probability Density (PPD) of the 
parameters for fixed data y and n{8) is the prior density on the 
parameters. We can therefore explicitly include the knowledge 
we already have on B in the PPD, so that n(B\y) becomes the 
quantity of interest instead of L(B). If the prior is adequately 
chosen it might significantly restrain the volume to sample in the 
space of parameters. This potentially offers two advantages: (i) 
from the statistical point of view, by reducing the number of lo- 
cal maxima, (ii) numerically, some algorithms being more likely 
to converge when the volume in the parameter space decreases. 
For general reviews on the Bayesian methodology, we refer to 
monographs by Gregory (2005) or Robert (2007). 

This approach has become very popular in the case of aster- 
oseismology of sun-like stars. The main reason is that realistic 
priors on the frequency distribution of the pulsation modes can 
be obtained by using the theoretical asymptotic distribution of 
the p modes 6 . 

Bayesian methodology applied to asteroseismology and how 
it might improve the fitting of the data has been discussed by, 
e.g., Brewer et al. (2007) and Gaulme et al. (2009). In the fol- 
lowing, we will mostly focus on the specific setups used in our 
algorithms, with a particular emphasis on the priors included in 
our probabilistic models. 

4.3. Markov Chain Monte Carlo estimation 

A widespread approach to Bayesian estimation consists in using 
Markov Chain Monte Carlo (MCMC) algorithms to approximate 
the PPD of the parameters. 

The (stochastic) sampling in the parameter space relies on 
the convergence properties of Markov Chains which, under cer- 
tain circumstances, generate realizations of a random variable 
according to a stationary distribution. The underlying idea of 
MCMC algorithms is thus to produce a Markov Chains whose 
stationary distribution is the target PPD (Robert & Casella 
1999). MCMC algorithms are a class of numerical methods that 
aim at approximating PPDs. One of their features is that one can 
sample a complex distribution using much simpler ones (often 
called instrumental laws). 

The base used here is a Metropolis-Hastings algorithm 
(Metropolis 1953; Hastings 1970). Because the complexity of 
the sampling increases with the dimension of the parameter 



The asymptotic relation in the limit of high frequencies (high or- 
ders) and low degrees was given to successive orders of approxima- 
tion by Vandakurov (1967) and Tassoul (1980) and can be written as 
v n ,i = (» + 1/2 + e)Av + 0(v _1 ), with Av the average large separation, 
i.e. the inverse of twice the stellar acoustic radius, and e a phase-related 
constant. 
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space, an additional scheme was added in order to set the instru- 
mental law used for the sampling. This was done in a burn-in 
sequence, during which the chain converges towards its station- 
ary distribution 7 . The algorithm also allows one to run multi- 
ple parallel chains in order to use tempering, which is useful to 
prevent the Markov Chains becoming stuck in local minima. A 
complete description of the algorithm was given by Handberg & 
Campante (2011). 

4.3.1 . Priors and algorithmic setup 

The prior density on the frequency is certainly the most char- 
acteristic feature of asteroseismology of sun-like star. We first 
assume that the components Bss and 9 go, of 6 are independent 
from the other parameters. We fix them before the estimation of 
the rest of the parameters. This assumption might affect the esti- 
mation of the frequency of some low-amplitude mode. However, 
it improves significantly the convergence of our algorithm. We 
can reduce the prior probability density to 

=tt(v,T,H), (6) 

where we used the simplifying notation v = (vj , . . . , v#), T = 

(Ti r*)and# = (#1,..., H k ). 

In practice, we recast the problem of estimating the K 
linewidths F^ to the estimation of only two parameters by as- 
suming a linear dependency of the linewidth on the frequency. 
We considered only the values of the linewidth F t and F 2 at 
2800 //Hz and 3600 //Hz respectively. From theoretical mod- 
elling and observational results in the Sun, it is clear that this is 
an oversimplification. However, it is difficult to suggest a proper 
empirical law and retaining a linear model should at least cap- 
ture the overall increasing trend of the linewidth with frequency 
(Houdek et al. 1999), although it might not be even be strictly 
monotonic (Chaplin et al. 2005). The relevant parameter space 
is now {v, Ti,r2,H}. Assuming that the v k , H and T 2 are all in- 
dependent parameters, we can write the second term in (6) as 



where 



n(v, T, H) = n(H\v, H , T 2 )n(v, H , T 2 ) 



(7) 



Mode heights have been fixed as follows. We first smoothed 
the power spectrum 8 , according to Kjeldsen et al. (2008). To 
each v k we can thus associate a value for the amplitude, A k , 
which is then converted to height using the relation H k = 
2A 2 k l{nT k ) (Chaplin et al. 2008). The first term in the right-hand 
side of Eq. (7) is therefore 



n(H\v,Y u T 2 ) = y\5(H k 



k=\ 



2A| 



)■ 



(8) 



For the frequencies and the mode lifetimes we chose uniform 
distributions 9 on the regions where the parameters are allowed to 
vary. The prior density simplifies to 



2A\ 



n{0) = \\ i ak {v k )6{H k - — i) \\ i 6 .(ro, 



(9) 



k=\ 



1=1 



7 This burn-in sequence is then discarded when estimating the PPD. 

8 This procedure might, of course, depend slightly on the filter used. 
We neglect this contribution. 

'Such distributions sometimes enter in the quite generic category 
of uninformative priors. Considering that we severely restrict our indi- 
vidual modes to vary in a small portion of the frequency domain, this 
seems hardly to apply here. . . 



1 if v € a, 
if via. 



(10) 



The intervals a k were chosen to have widths of 12 //Hz and be 
centred on the initial guesses. These were selected by multiply- 
ing by 0.991 the solar frequencies from Broomhall et al. (2009, 
see Sect. 4.3.4 for a discussion). This ensured that the daily 
aliases at + 1 1 .57 //Hz of each modes should not perturb the fit. 
Moreover, provided that the small separations (see Sect. 5.3) are 
large enough, this should also limit the possibilities of "mode 
swapping" for close peaks. We consider frequencies only in the 
range 1965-3700 //Hz. The total number of frequencies was fixed 
at K — 52. This prior effectively filters out the signal that is not 
included in windows centred on our first guesses for the frequen- 
cies. For the linewidths, we set b\ — b 2 — [0, 5] //Hz. 

This is essentially the same approach as used for the study of 
Procyon with the same algorithm in Bedding et al. (2010). One 
has to keep in mind that these are strong assumptions, especially 
the one on the mode heights. This has to be taken into consid- 
eration when comparing the results to other methodologies in 
Sect. 5. 

The simulation made use of 6 parallel chains for tempering. 
After removal of a ~250 000 sample burn-in sequence, we ob- 
tained an approximate PPD for 6 based on ~2 000000 samples. 

4.3.2. Statistical analysis 

The MCMC simulation provides us directly with the marginal 
probability densities of our parameters 10 . We can thus calculate 
various statistics that describe these densities. For each of them, 
we considered the posterior mode and the posterior median. 

The corresponding 100(1 — rj)% credible sets for the modes 
was defined as the smallest set containing the parameter with 
probability 1 > 1 - rj > 



C 



mode 



{v k € a k \n{v k \y) > q(rf)}, 



(11) 



with q the smallest constant such as P(C™ de [y) > 1 - 77. This 
credible set always include the mode 1 1 . 

For the medians, the credible 100/3% sets were defined as 
the intervals for which the parameter is higher or lower to these 
quantities with equal probability, fi/2 with 1 > (3 > 0. We com- 
puted them using using the cumulative distribution function 12 
(cdf) 

cr dmn = in € a k \ l -±P > cdf(nW > (12) 



The corresponding credible intervals are simply the upper 
and lower values of the corresponding set. We now discuss the 
relevance of these statistics using our MCMC algorithm on sim- 
ulated time series. 



10 The marginal PPD of a given parameter is simply the PPD inte- 
grated over all the other 



n(6,\y) 



(0\y)de l ...d9 i _ l dd i+1 ...d8 N 



"The marginal distribution itself was estimated from the MCMC 
sample using a kernel density estimation algorithm. 
12 The cdf is obtained directly from the MCMC output 



5 



M. Bazot et al.: Estimating the p-mode frequencies of the solar twin 18 Sco 



4.3.3. Results for simulated time series 

Evaluating the performance of estimation methods against arti- 
ficial data is common practice. Of course, we need simulations 
realistic enough to ensure the conclusions can be transposed to 
the real case. 18 Sco offers a good opportunity to carry out such 
an exercise. Indeed, its similarity to the Sun suggests that we 
can obtain satisfying mock data. To that effect, we simply need 
to start from a solar seismic model, which, with respect to aster- 
oseismic standards, is extremely well-known, and scale its pa- 
rameters accordingly. 

In this section, we use only one artificial time series, in par- 
ticular because of the MCMC algorithm is time-consuming. The 
artificial time series were constructed using the BiSON solar fre- 
quencies. They were multiplied by the ratio of 18 Sco average 
large separation to the solar one (=; 0.991). The mode heights 
and lifetimes were assumed to be solar. These parameters were 
given as input to the solarFLAG simulator (Jimenez-Reyes et al. 
2008) to produce evenly sampled time series. These data were 
subsequently interpolated to the observing times of the 18 Sco 
data. 

An additional heuristic adjustment has been made in order 
to produce a realistic data set. We scaled the amplitude of the 
time series by a constant factor and added to each point the re- 
alization of a Gaussian random variable to simulate the photon 
noise. This was done by setting the noise level in the frequency 
space and then converting to the time space using the relation 
0" p hot = NA(v)/tt, with (Tphot the scatter due to the photon noise, 
N the length of the time series, and A(v) the photon noise level 
as estimated from the amplitude spectrum. The scaling constant 
and the variance of the underlying noise distribution were ad- 
justed to reproduce roughly the signal-to-noise ratio observed 
in the amplitude spectrum of 18 Sco 13 . We crudely fixed these 
values to 0.95 and 0.4 cm/s, so that the artificial signal-to-noise 
ratio is 6.46, compared to the 6.65 value measured for the ob- 
servations. This gives a maximum amplitude and a noise level in 
the simulated spectrum within 5% of the observed ones. 

We note that such a photon noise level corresponds to an 
average amplitude of 2.7 cm/s (or 8.3 X 10" 5 m 2 s~ 5 /juHz) in the 
region 1800-2200 fiH. This is similar to the observed value, for 
which we assumed that only photon noise was present in this 
interval (Sect. 3). This might be an a posteriori warning that this 
assumption is inadequate. It might also mean that neglecting the 
term 3^'(v) in Eq. (3) is not valid. The time series was analyzed 
using the MCMC algorithm set as described above. 

Table 2 gives the results of our numerical experiment. It dis- 
plays the mode and median estimates of the frequencies and their 
corresponding credible intervals. We also listed the input fre- 
quencies to the time series simulator. These results might guide 
us as to which summary statistic to use. The median capturing 
the central tendency of the distribution, it might often lead to 
a better fit to the data than the mode of the marginal posterior. 
However, we notice that 35% of the input frequencies are con- 
tained in the credible intervals for the modes, whereas only 30% 
of them are in the median credible intervals. The mode credible 
intervals are in general larger than those computed for the me- 
dian, and quite often the former encompass the latter. These low 
values might indicate that we underestimate the uncertainties in 
both cases. On the other hand, if we define an average absolute 
error as Yjk l^ - v™ put \/K with v^, k — 1, . . . , K, the estimated fre- 

13 Defined for practical purpose as the ratio of the amplitude maxi- 
mum in the 1500-3700 yuHz region to the averaged amplitude over the 
1 800-2200 yuHz interval. 
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Fig. 4. Echelle diagram for 18 Sco. The blue dots represent 
the frequencies given in Table 3. The 68.3% credible interval 
are also represented. The black circles show the scaled BiSON 
frequencies for the Sun (their best-fitting model, Table 1 in 
Broomhall et al. 2009). To plot this echelle diagram, we used 
the value of the large separation given in Paper I. 



quencies, we find a larger value for the modes (1.76 //Hz) than 
for the medians (1.31 //Hz). This indicates that the median is 
indeed, on average, more accurate than the mode. 

A closer look at the marginal densities could give some 
explanation to this. We often observe very skewed or multi- 
modal marginal posterior densities for the eigenfrequencies of 
our model. It is especially true at low and high frequency, i.e. 
for low and high radial orders. The median is a better estimator 
in ~66% of the case, almost all corresponding to this kind of 
difficult situations. Therefore, and considering that the credible 
intervals are conservative enough for both statistics, we opted for 
the estimator leading to the most accurate and precise values, i.e. 
the median. 

It should also be noted that, at low frequencies, some den- 
sities are very strongly peaked around a central eigenfrequency. 
This translates into very narrow credible intervals that seem to 
be unrealistic. This can be explained by the fact that these pul- 
sation modes have long lifetimes and are thus unresolved by our 
time series. Thus, they appear in the spectrum as Dirac functions 
and are not properly modelled by a Lorentzian. They should be 
considered carefully. At the very least the credible intervals as- 
sociated with these particular frequencies are meaningless. 

4.3.4. Mode identification 

The problem of identification, i.e. associating a couple (n, I) to 
the frequencies 14 detected in the time series, can be extremely 
difficult to solve. The standard way to proceed consists in com- 
paring the observations to the theoretical asymptotic formula 
(see note 6). However, there are many instances of seismic stud- 



Rotation is neglected, therefore in = 0. 
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Table 2. Results from the benchmarking of the estimation algorithms. For the right and left tables, columns 1 and 2 give the orders 
and degrees of the simulated modes. Column 3 gives the input frequencies to the time series simulator (Jimenez-Reyes et al. 2008, 
only the frequencies we effectively searched for are displayed, more modes with lower amplitude were included in the simulated 
time series). Column 4 and 5 display the frequencies obtained from our MCMC algorithm using the posterior median (Col. 4) and 
the mode (Col. 5) estimates alongside with the upper and lower limits of the corresponding credible interval. Column 6 shows 
the frequencies estimated with the MAP direct-optimization methodology, alongside uncertainties derived by inversing the Hessian 
matrix. 



/ 


n 


Input 


Median 


Mode 


MAP 


I 


n 


Input 


Median 


Mode 


MAP 



14 2074.60 2072.33!°^ 2072.00+|g 2074.39 ± 0.39 2 13 2063.21 2061.33!^° 2060.73!°;™ 2064.18 ±0.43 

15 2208.52 2207.75!°°]] 2207.78+^ 2208.11 ±0.18 2 14 2197.53 2196.48+°;* 2196.54+^ 2197.01 ±0.14 

16 2341.62 2341.60!°f} 2341.76!°^ 2341.98 ±0.42 2 15 2331.10 2332.79!°;^ 2332.96+°;^ 2330.55 ± 0.34 

17 2473.77 2474.77 + °;g 2474.77+.};^ 2473.56 ±0.33 2 16 2463.53 2463.47+^ 2 2463.57+] : ^ 2463.76 ±0.53 

18 2606.07 2606.27 + °$ 2606.71+.^°, 2606.10 ±0.08 2 17 2596.15 2595.29+°,;'°, 2595.64+°;^ 2596.00 ± 0.08 

19 2739.37 2738.42 + °j* 2738.22+^ 2738.57 ±0.16 2 18 2729.73 2726.75+°;^ 2726.94+°;™ 2727.57 ±0.17 

20 2872.82 2872.56!°^ 2871.87+^ 2872.38 ±0.13 2 19 2863.47 2861.20+°;^ 2861.00+°;g 2863.65 ±0.15 

21 3006.50 3002.07^2 3001.1 1+^, ^ 3005.58 ±0.26 2 20 2997.50 2996.55!°;*° 2996.97+°;^ 2997.59 ±0.24 

22 3140.17 3141.63^j° 3141.70+°^ 3140.86 ± 1.18 2 21 3131.46 3131.08+°;]* 3131.11+°;^ 3131.55 ±0.39 

23 3273.89 3275.11^ 3275.05+° ^ 3274.67 ± 0.73 2 22 3265.46 3263.80+°;^ 3264.08+°;^ 3264.55 ± 0.52 

24 3408.19 3406.99!°^ 3406.70+^ 3407.34 ±0.28 2 23 3400.03 3398.69+°;^ 3398.88+°;^ 3399.34 ±0.46 

25 3542.70 3542.89+.° g 3543.37+^ 3540.86 ± 1.27 2 24 3534.80 3530.31+°;^ 3529.53+i;| 3531.10 ±2.02 

26 3677.54 3674.49+.°;™ 3674.42+];^ 3675.50 ± 1.88 2 25 3669.86 3667.57+];^ 3665.18+^ 3665.32 ± 2.00 

13 2002.44 2001.50+°;°' 2001.51+°;^ 2001.96 ±0.26 3 12 1983.00 1984.31+];^ 1985.79+^2 1983.74 ±0.48 

14 2137.33 2137.00!°™ 2136.62+^ 2137.33 ±0.43 3 13 2118.42 2115.46+^ 2113.83+™° 21 18.57 ±0.38 

15 2271.23 2271.32+.°;^ 2271.52+°;]' 2271.37 ±0.35 3 14 2252.78 2251.42+];^ 2250.00!°™ 2254.28 ± 0.68 

16 2403.73 2403.16+°;^ 2403.211°;^ 2403.89 ± 0.39 3 15 2385.91 2385.77+°;^ 2386.00+^ 2386.54 ± 0.82 

17 2536.27 2540.48!°:$ 2541.21+°;^ 2536.61 ±0.16 3 16 2518.85 2519.01+.';°° 2518.67+^ 2519.77 ±0.18 

18 2669.12 2669.39!°;* 2669.41!°;^ 2669.92 ±0.21 3 17 2652.01 2649.85!°|] 2649.73+°;^ 2650.13 ±0.22 

19 2802.69 2802.13!°;]] 2802. 14!°;2« 2803.04 ±0.18 3 18 2786.05 2789.10!°;^ 2789.09+^ 2787.33 ± 0.23 

20 2936.64 2936.45+.°;]^ 2936.42!° ^ 2937.02 ±0.18 3 19 2920.37 2923.79!° * 2924.46+.';'° 2921.05 ±0.47 

21 3070.29 3072.04!] ^ 3073.84!] * 3074.67 ± 0.54 3 20 3054.48 3054.85+j™ 3056.51!° ™ 3057.32 ± 0.62 

22 3204.20 3202.73!°;™ 3202.78!°;^ 3203.02 ±0.51 3 21 3188.85 3187.85!] ^ 3187.58+; ^ 3184.63 ±1.26 

23 3338.27 3338.60!°;^ 3338.49!°;*] 3339.05 ±0.31 3 22 3323.28 3324.29!];^ 3323.05 +4™ 3322.36 ±0.46 

24 3472.67 3471.80!°^ 3472.38!°^ 3472.51 ± 1.13 3 23 3458.15 3457.42!];^ 3456.20!) f 3456.01 ± 1.20 

25 3607.63 3604.87^ 3603.29!°;^ 3605.10 ±2.22 3 24 3593.50 3595.80!];* 3596.28+|;g 3587.87 ±2.49 



ies, both from space and ground-based, whose outcome led to 
uncertainty on the identification (Martic et al. 2004; Carrier & 
Bourban 2003; Kjeldsen et al. 2005; Appourchaux et al. 2008; 
Benomar et al. 2009; Bedding et al. 2010). This of course may be 
pathological when confronted with single-site ground-based ob- 
servations. This is problematic from the modelling standpoint, 
because if one wants to compare observed and theoretical fre- 
quencies (or, see Sect.5.3, rational functions of the v„;), one 
needs to be able to identify precisely the observed modes. 

We are thus in a very particular position when studying 
18 Sco, in which we can confirm a posteriori the orders and 
degrees we assigned for each mode. If we make the assumption 
that in a close vicinity of the Sun the surface related terms in the 
asymptotic relation do not vary significantly, we can consider 
that the individual frequencies scale roughly as the average large 
separation, i.e. as the mean density of the star. 

Such a situation is expected if the stars are close to homolo- 
gous (which in fact is never really the case for main-sequence 
stars). This is somewhat justified by the findings of Paper I, 
giving a ratio for the large separations Av Q /Avi8Sco ~ 1.0007. 
Therefore, we simply assign the (n, I) by comparing directly the 
frequencies of 18 Sco to the scaled solar ones as suggested by 
Bedding & Kjeldsen (2010). A graphical check is sufficient to 



do so, as seen in Fig. 4, in which are plotted the solar modes 
obtained from BiSON (Broomhall et al. 2009). 

4.3.5. Observed frequencies 

We analyzed the observed time series using the method de- 
scribed in Sect. 4.3. 1. For the 52 fitted modes, we found credible 
intervals consistent with those estimated from the simulated time 
series. Given the comments made above, we are confident in 
their robustness. In some cases we obtained very narrow credible 
sets, whose upper and lower limits, relative to the median value, 
are < 0.1 //Hz in absolute value. These are always high ampli- 
tude modes. In Table 3, we give the medians of the marginal 
densities alongside the corresponding 68.3% credible regions. 
They are shown in Fig. 4. 

The marginal probability densities are qualitatively in good 
agreement with the results from the simulations. However more 
of them exhibit multiple maxima, perhaps because of the lower 
photon noise level used in the simulations. This is always prob- 
lematic when using the median, since it might bias the estima- 
tion towards maxima not corresponding to the real frequency. 
We flagged these modes in Table 3. Note, however, that the pro- 
cess remains subjective, and we considered only the densities for 
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Table 3. Oscillation frequencies detected for 18 Sco along with their radial order n and degree I. The quantities given here are 
the median of the marginal posterior distributions for each eigenfrequency considered in our model. The corresponding credible 
intervals are given alongside. We use some flags to signal estimates requiring additional caution. (*) indicate eigenfrequencies with 
clear multimodal marginal distributions, (t) pulsation modes potentially unresolved, ($) pulsation modes that have not passed a 
hypothesis testing when using the heights estimated with the MAP approach in Sect. 5.1.2 and (?) the mode that has not been 
identified by the MAP approach. 



/ = 



/ = 1 



/ = 2 



/ = 3 



12 














1985. 12!^ 


(t) 


13 






2001.51!°;^ 


(t) 


2062.78!°,;°^ 


(t) 


2117.89!°|° 


(*) 


14 


2074.4 i:j;°j> 


(t) 


2140.38!°,$ 


(t) 


2198.63!°;^ 


(*) 


2253.00!|^ 




15 


2208.791'$ 


(*,*) 


2271.66!J;{| 


(*) 


2331.20!|;^ 


(*) 


2383.34^ 


(*,t) 


16 


2342.00!°;^ 




2403.96!°;^ 


(*) 


2464.271^° 


(*) 


2518.43!°;^ 




17 


2473.63+j;™ 


(*) 


2536.58!°$ 




2595.27!°;^ 


(*,*) 


2650.84!°^ 




18 


2607.31!°;^ 


(*) 


2668.72!°'^ 




2730.13!°;™ 




2786.75*5 




19 


2740.27!°^ 


(*) 


2802.67!°^ 




2863.20!°;^ 


(*) 


2924.12!°ji 




20 


2873.78+^' 


(*) 


2936.02!°g 




2994.18!g°i 


(*) 


3055.39!°;^ 


(*) 


21 


3005.34!^| 


(*,*) 


3071.49!°,;^ 




3132.89!°,$ 




3187.21!|;g 


(*) 


22 


3140.10!°* 




3202.06!°;^ 


(*) 


3263.98!°;^ 




3322.77!°;^ 




23 


3275.80!°;^ 


(*) 


3338.91!°;^ 




3397.52!°^ 




3458.25!};^ 


m 


24 


3408.53!°;^ 


(*) 


3475.34!°;^ 


(*) 


3531.08!J;« 
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3595.931!;°,° 


(« 


25 


3545.96+°$ 


(*,*) 


3605.76!°;^ 




3667.94!°^ 








26 


3674.80!°;^ 

















which the secondary maxima were obvious. These modes can 
be used in subsequent studies, but one should keep in mind that 
the summary statistics we used do not capture all the features 
of the distribution. The average width of the 68.3% credible re- 
gions on the median is slightly smaller than the one observed for 
the simulated time series, but this should not prove significant. 
Compared to the simulations, we also found more skewed dis- 
tributions. This reinforces our choice to use the median for the 
statistical summary. 

We again observe some very sharply peaked marginal den- 
sities. Such shapes lead to very narrow and unrealistic credible 
regions, for instance much narrower than has been found from 
space missions with much longer observing baselines (see e.g. 
Gaulme et al. 2009; Mathur et al. 2010; Campante et al. 2011). 
They appear at low frequencies where the mode lifetimes are 
known to be longer, and we can thus express doubts about the 
resolution of modes (0,14), (1,13), (1,14), and (2,13), which are 
flagged accordingly in Table 3. They are all located at low fre- 
quencies, v < 2150 //Hz. This give an approximate limit above 
which the modes start to be resolved. In the case of modes (1,13) 
and (1,14), the kernel estimations of the densities return a numer- 
ical error. This might indeed confirm that they cannot be approx- 
imated by a continuous density, which would be characteristic of 
an unresolved pulsation mode. These modes might very well be 
real. However, because the Lorentzian model becomes incorrect 
for unresolved modes, the associated uncertainties are likely to 
be widely underestimated. 

Finally, and anticipating the discussion below, some modes, 
when estimated with an alternative strategy which does not fix 
the heights (see Sect. 5) did not pass an hypothesis testing. We 
flagged them and do not recommend their use in subsequent 
studies. 

The results of our MCMC simulation are available at www . 
astro . up . pt/ -bazot/ data/18ScoII/. 



5. Discussion 

5. 1 . Comparison to a MAP approach 

Our goal in this section is to compare the performance of our 
MCMC approach with another Bayesian strategy based on the 
direct optimization of the PPD. Note that this is a very general 
comparison, since not only do we change the a posteriori esti- 
mators (median and maximum of the PPD), but we also modify 
our probabilistic model, i.e. the priors. We chiefly want to get 
an idea of how consistent they might be. This is good procedure 
to cross-check results obtained using different methodologies. 
In the case of 18 Sco, given the relatively difficult nature of the 
data, we see this as necessary. 

In the Maximum A Posteriori (MAP) approach, the likeli- 
hood is replaced by the PPD. The estimator for the model pa- 
rameters becomes 

MAP(0) = argmax(7r(0|;y)). (13) 
e 

This estimator is sometimes called the regularized likelihood. 
This is a common strategy and has been used in the case of the 
Sun (Chaplin et al. 2002; Broomhall et al. 2009) and stars ob- 
served from satellites (Appourchaux et al. 2008; Deheuvels & 
Michel 2010). Here, the regularized likelihood is maximized us- 
ing a Powell algorithm. 

5.1.1. Setup and tests 

An interesting feature of this direct optimization approach is 
that it converges somewhat more easily than our MCMC algo- 
rithm. Therefore, we were able to use less constraining priors. 
The mode lifetimes and heights were both left free to vary. We 
applied on the frequencies priors close to those described by 
Eq. (9). The main difference is that, instead of considering in- 
dividual frequencies, we considered pairs of frequencies of sim- 
ilar parities and differing by one radial order. The were set to 
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Table 4. Estimated frequencies of 18 Sco from the MAP method using direct optimization. The associated uncertainties were derived 
from the inversion of the Hessian matrix. 



_n /_=0 1=1 1 = 2 / = 3 

12 1984.32 ±0.68 

13 2002.00 ±0.26 2063.13 ±0.16 2118.01 ±0.10 

14 2074.88 ±0.17 2140.60 ±0.12 2199.47 ±0.39 2254.95 ± 0.24 

15 2209.49 ± 0.19 2270.54 ±0.17 2330.74 ± 0.20 2385.35 ±2.17 

16 2339.14 ± 0.18 2404.25 ± 0.49 2467.29 ±0.78 2519.03 ±0.44 

17 2475.62 ±0.67 2536.84 ±0.11 2595.50 ±0.15 2651.51 ±1.24 

18 2605.46 ± 0.15 2669.69 ± 0.97 2730.41 ± 1.06 2786.96 ± 0.20 

19 2741.34 ± 1.54 2803.55 ±0.11 2863.74±0.17 2918.07 ±1.40 

20 2874.13 ±0.20 2936.92 ± 0.69 2994.29 ± 1.18 3055.77 ±0.61 

21 3006.00 ± 1.35 3071.97 ±0.48 3133.10 ±0.43 3186.07 ±1.53 

22 3140.58 ±0.40 3203.11 ±0.95 3264.55 ± 0.57 3322.89 ±0.40 

23 3276.19 ± 0.68 3339.44 ±0.37 3398.24 ±0.52 3459.01 ± 2.76 

24 3408.74 ± 0.60 3476.24 ± 1.81 3545.86 ± 3.58 3593.56 ±2.68 

25 3556.15 ±3.92 3610.70 ±2.32 3669.81 ±2.20 

26 3679.63 ±2.51 



±22 //Hz above the I = (or 1) and below the I = 2 (or 3) modes 
(Fletcher et al. 2009). 

It should be noted that the uncertainties in the MAP frame- 
work are estimated by inverting the Hessian matrix of the pa- 
rameters. This is well-justified if the errors, and their second 
derivatives with respect to the parameters, are small and if these 
derivatives are not much correlated with the errors. Moreover, 
the posed fitting problem must also be well-constrained, other- 
wise the formal uncertainties will also be a poor representation 
of the true uncertainties. In our case, given the relative complex- 
ity of the probability distributions we consider (as can be seen 
from the MCMC results), these assumptions are not likely to 
hold. We also note that, from one mode to the other, the Hessian- 
derived and MCMC uncertainties might differ significantly from 
the MCMC estimates. 

Another issue with the MAP estimation approach is the po- 
tential instability of the minimization algorithm with respect to 
the initial guesses. To test this, we varied randomly the first- 
guess frequencies - using a top-hat distribution of width ±3 fjHz 
- and the final frequencies were median estimates over 1000 
such fits. We found that the scatter in the estimate over these 
1000 fits is similar to or larger than the Hessian-derived uncer- 
tainties. It was the possibility of deriving more realistic uncer- 
tainties that ultimately led us to choose the MCMC estimates as 
a reference. 

We tested our MAP algorithm using the artificial data de- 
scribed in Sect. 4.3.3. The results are given in Table 2, alongside 
those from the MCMC approach. The estimated frequencies are 
extremely close for the two methods. Nevertheless, the MAP al- 
gorithm performs better in terms of accuracy, with the average 
absolute error being lower than from the MCMC estimates val- 
ues. 

This opens the door to questions with respect to the proper 
use of priors in Bayesian analysis. Indeed, if this relative lack 
of accuracy in the MCMC approach is caused by a bias intro- 
duced by the stringent prior constraints imposed, it means that 
those included in our MAP setup are better. However, this is a 
difficult problem, numerically, to assume such a great variation 
of the mode lifetimes and heights in the MCMC approach. This 
shows how delicate it is to choose between different Bayesian 
methodologies. Note however that for sufficiently long and pre- 
cise measurements the two approaches should converge. 



5.1.2. Estimates from the observations 

Applying the MAP algorithm to the real data, we found results, 
given in Table 4 for reference, similar to those of the MCMC ap- 
proach for the frequency estimates. The uncertainties are consis- 
tent with our tests using simulated time series. There is only one 
inconsistency in the identification between the two approaches. 
The (I = 2, n = 24) mode was identified as (/ = 0, n = 25) in 
the MCMC framework. It is likely that the (I = 0, n = 25) fre- 
quency returned from optimization is in fact an alias of the real 
mode. However, these are very low-amplitude peaks, rejected in 
the hypothesis testing (see below). 

It should also be noted that the greater "flexibility" we have 
in terms of convergence has allowed us to estimate the mode 
heights and the lifetimes. This allowed us to perform a posteri- 
ori some hypothesis testing for each mode (Appourchaux et al. 
2009). It is somewhat more satisfying to carry out such tests on 
genuinely estimated heights rather than on fixed heights derived 
from a filtered spectrum, such as the ones used for our MCMC 
simulations. Therefore, in Table 3, we flag the values with posi- 
tive hypothesis testing results. This way, one can chose or not to 
include them when using the list of frequencies. Note that only 
40 frequencies were incompatible with the Hq (null) hypothesis, 
which tests here the hypothesis that the peak is due to the noise 
in the data. 

5.2. Comparison with time-domain modelling 

Another approach commonly used in asteroseismology involves 
representing the signal in the time domain 

M 

y(t n ) = ^ c k sm(2nv k t n ) + d k cos(2jrv k t„) + e n , (14) 

k=\ 

where (c k , d k ) and v k are the amplitudes and frequencies of the 
pulsation modes and e n , the noise. 

Comparing with model (3), some shortcomings of model 
(14) are clear. In particular, it does not take into account the 
fact that the modes have finite lifetimes. This may lead to an 
over-fitting of the signal in the vicinity of some modes, i.e. sev- 
eral sine functions being required to reproduce what is actually 
the dual of a Lorentzian. However, this effect clearly depends 
on the ratio of the characteristic damping time to the length of 
the time series. If it is large, then the chances are high that the 
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mode will be unresolved, the power excess largely being con- 
fined to one frequency bin (see Sect. 5). In this case, model (14) 
will be accurate enough. The unresolved-mode assumption has 
often been made in the case of ground-based seismic observa- 
tions (Kjeldsen et al. 1995; Bouchy & Carrier 2001; Bouchy 
et al. 2005; Bazot et al. 2007; Kjeldsen et al. 2005; Bedding 
et al. 2010). Because of its simplicity, interesting methodologies 
can be applied to the problem of estimating its parameters. We 
consider two here. The first one is the well-known CLEAN al- 
gorithm (Gray & Desikachary 1973; Roberts et al. 1987), used 
for iterative deconvolution, and the second one is the SparSpec 
algorithm, a penalization approach to minimization in the con- 
text of spectral analysis (Bourguignon et al. 2007). We used both 
methods to make sure that the results discussed below are not 
due to algorithmic artifact. 

The objective of this section is to compare our results with 
simulations in order to understand the impact of our choice for 
a physical model (time-domain representation against frequency 
domain representation) for the power spectrum. We also try to 
understand, at least qualitatively if this is more important than 
our choice for the priors on the parameters included in our prob- 
abilistic description. We explain how, in the case of 18 Sco, a 
frequency-domain representation is an improvement for gapped 
and irregularly sampled time series, for which models such as 
Eq. (14) have previously been used. 

5.2.1 . Performances of the methods 

We used a sample of 100 artificial time series constructed as de- 
scribed in Sect. 4.3.3. They only differ by the realizations of the 
low-frequency and white noises. We applied the MAP, CLEAN 
and SparSpec algorithms to each element of this sample. The 
MAP setup is similar to the one described in Sect. 5.1.1. In the 
case of CLEAN, we limited our search to the 1500-3700 //Hz 
region. We set a threshold for the detection at three times the 
noise level in the 1800-2200 //Hz interval, that is 7.2 cm.s -1 . 
The relevant parameter for SparSpec is the penalization factor 
(Bourguignon et al. 2007), which we empirically set to 0.34, so 
that the results are close to those obtained for CLEAN. 

Note that we did not include the MCMC algorithm in this 
comparison. This should not be a problem since we have seen 
that the results from the direct MAP optimization and the 
MCMC agree well. Our main goal is to evaluate the efficiency 
of our algorithm, i.e. how the estimated frequencies reproduce 
the input frequencies to the time series simulator. We are not 
concerned with the uncertainties on the parameters here, which 
were the main reason to retain the MCMC estimates as our ref- 
erence. Therefore, since we have seen in Sect. 5.1 that the MAP 
and MCMC approach lead to close enough estimates, one can 
extrapolate the following discussion to the MCMC case. 

In the case of CLEAN and SparSpec, it is not possible to go 
through the normal adjustment of the outputs of the algorithms 
for the 100 realizations of the time series 15 . We can only obtain 
crude estimates of how many times each input frequency is actu- 
ally detected. This can be done by looking at Fig. 5, which repre- 
sents the histograms of the 100 outputs for each method. These 
are upper limits to the rate of detection of the time-series sim- 
ulator input frequencies, mostly because of the multiple peaks 
sometimes necessary to describe a single mode. 



15 These adjustments consist in removing manually the peaks that are 
obviously due to noise or aliases not properly removed by the algorithm. 
In a sense, this is very much similar to applying some prior knowledge 
one would have on the frequencies, but after the estimation process. 
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Fig. 5. Histograms of the output frequencies from the MAP, 
CLEAN (with and without prior filtering) and SparSpec algo- 
rithms for 100 artificial time series. The vertical green lines mark 
the input frequencies to the time-series simulator. 



The two methods lead to very similar results. This indicates 
that the model we used is the main factor determining the out- 
come of the estimation process. Both methodologies are obvi- 
ously very sensitive to the amplitude of the mode. Only in the 
2800-3400 //Hz region do these algorithms find the input fre- 
quency ~50% of the time within a frequency interval corre- 
sponding to twice the natural resolution 6 = 1/T. This percent- 
age, drops strongly below and above these values. 

In the MAP case, the main factor affecting the efficiency of 
the algorithm is the mode lifetime. More precisely, at higher fre- 
quencies, when the mode is most likely resolved, the proportion 
of correct detections decreases. It is in the range 60%-100% for 
frequencies between 1980 //Hz and ~3400 //Hz. It drops signif- 
icantly for eigenfrequencies above ~3400 //Hz. In any case, the 
detection rates are much higher than those observed for CLEAN 
and SparSpec. This is the reflection of the fact that we used dif- 
ferent physical models to describe the signal. It is somewhat in 
disagreement with the findings of White et al. (2010). 

5.2.2. Impact of the prior formulation 

To further understand how the priors affect the results, we tried 
to combined our CLEAN algorithm with constraints similar to 
those described by our prior on the frequencies. It should be 
noted that algorithms such as CLEAN have not been designed 
with a Bayesian perspective in mind (see for instance Schniter 
et al. 2009). Therefore, we could only try to mimic the impact of 
the prior. To this effect, we retained the idea that the frequency 
prior acts in analogy like a bandpass filter, which removes all 
signal outside the top-hat functions. We therefore applied such a 
filter (a sum of bandpass filters) to our spectrum. We then used 
the CLEAN algorithm to search only for two frequencies per 
individual bandpass filter. In a sense, this strategy is very sim- 
ilar to the "ridge search" approach used for r\ Boo by Kjeldsen 
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et al. (1995). The corresponding output histogram is displayed in 
Figure 5. The information we get from this test is, of course, only 
qualitative, but it gives an interesting picture of the performance 
of the two algorithms under constraints that are fairly similar. 

We performed this test on the same 100 time-series sam- 
ple. We can see that this definitely enhances the performance 
of the CLEAN algorithm. The detection rate increases every- 
where, particularly in the low-frequency regions of the spectrum. 
In the high-frequency regions, the situation also improves, but 
the model is subject to limitations concerning the mode life- 
times, which perturb the estimation. However, the overall per- 
formance remains largely inferior to the outcome of the MAP 
strategy, which uses model (3). This is very revealing as to the 
effect the priors have on the final frequency estimates. It is often 
contended that Bayesian analysis may use too strong priors and 
retrieve only what as been defined in n(6) before the estimation. 
This is not entirely the case here. The priors on the frequencies 
we used are not so strong that any algorithm will be able to per- 
form equally well under such a constraint. This result illustrates 
the subtle interplay between the numerical and statistical advan- 
tages of the Bayesian method mentioned in Sect. 4.2. Not only 
do the priors tighten the relevant volume in the space of parame- 
ters, but they also stabilize the fit to the data when using a more 
complex but also more accurate model, involving a larger num- 
ber of parameters (higher dimension of the parameter space). 

This is not the first time that Bayesian methods have been 
used on time series with such short time baseline (Brewer et al. 
2007, for instance). However, the very favourable case of 18 Sco 
allows us to contend that, provided the frequency priors are accu- 
rate enough, direct fits to the power spectrum are more efficient 
than classical time-domain modelling. A further step would be 
to test this claim with more sophisticated models for the spec- 
trum (Stahn & Gizon 2008) or the time series (Brewer & Stello 
2009). 

5.3. Large and small separations 

Two common seismic indicators are the large and small separa- 
tions, defined respectively by Av/(n) = v n+ \j-v„j and 6vij+2(ri) = 
v n ,i - v„-\j+2- They both stem from a first analysis of the asymp- 
totic relation for p modes. Their use has been popularized by 
the fact that they are supposed to be relatively free of the un- 
known surface effects affecting the oscillation frequencies. This 
however is only partially true and other combination of frequen- 
cies have been suggested in the literature (e.g., Roxburgh & 
Vorontsov 2003; Cunha & Metcalfe 2007). We nevertheless limit 
ourselves to these two quantities, which are plotted in Fig. 6. 

It is also interesting to compute the average values of the 
large and small separations. In order to estimate them, we use 
the MCMC samples. They are convenient to study densities of 
averages because, if we assume that the central-limit theorem 
roughly applies, we can expect to deal with Gaussian distribution 
(Benomar, private communication). This greatly simplifies the 
subsequent statistical analysis. 

To estimate the average large separation we retained only the 
unflagged modes in Table 3. Basing ourselves on the asymptotic 
relation, we consider that, for a fixed degree, the frequency is a 
linear function of the mode order, with slope the average large 
separation. We thus computed the derivative of v{n) at each order 
for each degree and averaged over both quantities. The resulting 
distribution is well approximated by a Gaussian and we obtained 
(Avo,2) = 133.8 ± 0.2 yuHz. This value agree with the estimate of 
Paper I, <Av , 2 > = 134.4 ± 0.3 //Hz, at 2cr level. 
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Fig. 6. Individual large (lower panel) and small (upper panel) 
separations for 18 Sco. The circles (o) corresponds to / = 0, 
the diamonds (0) to I = 1, the squares (□) to / = 2 and the 
triangles (A) to I = 3. Filled symbols mark the combinations 
using unflagged frequencies in Table 3. A few error bars do not 
appear because they are smaller than their respective symbols. 

This sheds a new light on the mass estimate we gave in 
Paper I. Using these new values for the average large separa- 
tion and the interferometric radius, one might evaluate the mass 
of 18 Sco to be 1.01 ± 0.03 M , which brings it even "closer" to 
the Sun. Although this agrees within the l<x error bars with the 
value of Paper I, it is relevant for modelling if one is to use on 
of these masses estimates in order to, for instance, apply a prior 
on the mass when modelling 18 Sco in the Bayesian framework 
(Bazot et al. 2008). 

The average small separations are slightly more problem- 
atic. This is due to the fact that fewer values are available to 
average over, in particular for <5vo2- We thus decided to include 
the frequencies from Table 3 flagged with a (*). Even though 
they display the multiple maxima, our idea is that the most 
prominent peaks in the distributions will be the main contrib- 
utors to the density of the averaged value. We indeed found 
Gaussian distributions for (Svq^) and {6v\$) (the latter being 
much better approximated by such a distribution than the for- 
mer). Using their first moments, we get {6vq2) = 9.4 + 0.9 juHz 
and <(5vi, 3 > = 16.7 ± 0.8 //Hz. 

It is known that the small separations depend on frequency. 
However, to the first order, they can be approximated by con- 
stants 5vo j2 (n) — 6Dq =s 3(5vi3/5, with Dq an integral containing 
the derivative of the sound speed (Gough 1986; Gabriel 1989). 
The estimated values lead to a ratio (6vo^}/(6v\^) = 0.57, in 
good agreement with the theoretical expectations. 

6. Conclusion 

We presented a detailed analysis of the ground-based seismic 
data obtained for the solar twin 18 Sco from the high-precision 
spectrograph HARPS. The sampling of the time series causes se- 
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rious problems for stellar eigenfrequency estimation. We chose 
to use an MCMC algorithm in order to estimate the frequencies 
of 52 stellar pulsation modes. A careful examination of the pos- 
terior probability densities for each of them show that at least 
21 are reliable and at least 19 others are worth consideration, 
even though the corresponding marginal PPDs are more diffi- 
cult to analyze. 11 were rejected after comparison with a the 
MAP direct optimization methodology. We were able to esti- 
mate Bayesian credible intervals for these modes which reflect 
with some robustness the uncertainties of our data. 

By comparing with other estimation methods, we have dis- 
cussed how reliable are the priors we used for the estimation. On 
the one hand, they are constraining enough to allow us to use a 
(relatively) realistic model. On the other hand, they are not so 
restrictive that they would impede a proper estimation. We note 
that our methodology can be further improved by increasing the 
efficiency of our MCMC algorithm (which would allow to relax 
further the priors on the parameters) and/or by using even more 
accurate models (which may require more conservative priors). 

The individual eigenfrequencies obtained for 18 Sco allowed 
us to study some basic seismic estimators, including the large 
separations, whose estimation of the average value was ad- 
dressed earlier in Paper I. The two values agree at 2cr level, with 
the new one being lower. We derived a new value for the mass 
of the star slightly lower than the previous one. It remains to see 
how much this might affect the modelling of the star. 

The sampling issues of our data certainly played an impor- 
tant part in producing the differences observed between the var- 
ious methods used in our study. A next step would be to observe 
this star with more than one telescope, even though its magnitude 
makes such a task challenging for most of the ground-based in- 
struments now available. Furthermore, we noticed that the length 
of the time series might imply that we only resolve a fraction of 
the detected modes. This could be resolved with a longer time 
basis. 
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